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£S) i Abstract: Capture-recapture studies are widely used to obtain information about abundance 

2 (population size or density) of animal populations. A common design is that in which multiple 

3 distinct populations are sampled, and the research objective is modeling variation in population 
^ 4 size N s ; s = 1,2, . . . , S among the populations such as estimating a treatment effect or some other 
t^J- 5 source of variation related to landscape structure. The problem is naturally resolved using 

6 hierarchical models. We provide a Bayesian formulation of such models using data augmentation 

7 which preserves the individual encounter histories in the model and, as such, is amenable to 

a modeling individual effects. We formulate the model by conditioning on the total population size 

9 among all populations. In this case, the abundance model can be formulated as a multinomial 

10 model that allocates individuals among sites. MCMC is easily carried out by the introduction of a 

11 categorical individual effect, gi, which partitions the total population size. The prior distribution 
, 12 for the latent variable g is derived from the model assumed for the population sizes N s . 

13 Key Words: capture-recapture, data augmentation, Dirichlet compound multinomial, hierarchical 

u models, individual covariates, individual heterogeneity, Markov chain Monte Carlo, WinBUGS 

(~ 15 1 Introduction 

in 

^—i i6 Capture-recapture models are widely used in ecology and wildlife management to estimate the size 

17 of animal populations (Williams, Nichols and Conroy 2002). A common situation in many ecological 

is studies concerns the case where the population is divided into spatially, temporally referenced or 

19 otherwise grouped (henceforth stratified) populations. This is frequently characteristic of animal 

20 population studies because spatial replication is often crucial to the scope of inference. For example, 

21 one might establish S experimental units to investigate the influence of some factor or treatment 

22 on animal abundance, and carry-out a capture-recapture study on each unit (Dorazio et al. 2005; 

23 Converse and Royle 2012). In addition, in studies of rare or elusive species that may be difficult to 

24 capture, multiple sample units are necessary to obtain a sufficient sample of individuals in order to 

25 effectively estimate parameters of capture- recapture models (Converse and Royle 2012). 

26 As the number of replicates increases, and especially in the presence of sparse data or low sample 

27 sizes for some of the populations, it becomes increasingly necessary to aggregate information across 

28 replicates using a model in order to estimate these parameters and to improve the precision of 

29 local population size estimators. Moreover, a common objective of animal population studies is to 
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30 develop models relating local population size to explicit spatial or temporal covariates that describe 

31 the stratification structure. In practice, this is often done by obtaining local population-specific 

32 estimates of N and then developing additional models for this collection of estimates, as if they 

33 were data (i.e., "doing statistics on statistics" Link 1999). These two objectives can be addressed 

34 coherently with the use of hierarchical models. In addition, a benefit of the use of a formal model- 

35 based framework for combining data from multiple populations is that it explicitly accommodates 

36 variability among spatial sample units so that valid variance estimates are obtained for estimates 

37 of mean population size across populations, or for predictions on new, unsampled units or under 

38 hypothetical conditions. Models allow for explicit characterization of prediction uncertainty, and 

39 so there is clear utility in devising flexible models for providing robust estimates of variance. 

40 Hierarchical models provide a natural framework for modeling data from studies on multi- 

41 pie populations and for addressing the important inference problems of aggregation, prediction 

42 and variance estimation. In the context of animal population studies based on capture-recapture 

43 methods, we can extend standard capture-recapture models by including a model component that 

44 describes variation in population size (Royle 2004a; Royle et al. 2004; Dorazio et al. 2005; Royle 

45 and Dorazio 2006). For example, let N s be the size of population s where the s = 1, 2 . . . , S popu- 

46 lations are organized spatially in some manner. Regarding the {N s } as latent variables, we might 

47 assume N s ~ Poisson(\ s ) and then focus attention on modeling the parameters X s . Evaluating 

48 factors that affect X s might be the main focus of many studies, but such hierarchical models also 

49 allow us to obtain estimates of the latent variables N s , i.e., the size of specific populations. 

50 We provide general formulations of hierarchical models of abundance from capture-recapture 

51 data by constructing specific classes of prior distributions for N s . We provide a framework for 

52 Bayesian analysis that permits analysis of models in which the dimension parameter space is itself 

53 unknown. This arises in classes of models in which the population size, N, is an unknown parameter, 

54 and there are individual- level effects such as heterogeneity or individual covariates. This variable- 

55 dimension parameter space problem causes considerable difficulty in the analysis of such models 

56 using classical methods of MCMC (Durban and Elston 2005; Royle et al. 2007a; Schofield and 

57 Barker 2011). Motivated by a need for general Bayesian treatment of the problem we consider 

58 analysis by data augmentation (Royle et al. 2007a; Royle and Dorazio 2012) which provides a 

59 framework for Bayesian analysis using a fixed dimension data set constructed in a specific manner. 

60 One problem with analysis of models using data augmentation is, because N is not an explicit 

61 parameter in the model under the reparameterization induced by data augmentation, it is not clear 

62 how to model variation in N among populations, i.e., using hierarchical models, and this has been 

63 suggested as a limitation of data augmentation (Schofield and Barker 2011). In this paper we 

64 resolve this problem by providing a more general formulation of data augmentation for stratified 

65 populations, in which interest is in modeling variation in N among populations. 

66 To adapt data augmentation for hierarchical models involving stratified populations, we first 

67 develop a multinomial parameterization of stratified models that conditions on the "total" popula- 

68 tion size. This yields a multinomial formulation of the model that is suitable for Bayesian analysis 

69 by data augmentation. We also consider a general form of the model using a Dirichlet compound 

70 multinomial model for the population size variables. Data augmentation for such models is eas- 

71 ily implemented in a number of popular MCMC packages such as WinBUGS (Gilks et al. 1994) 

72 and JAGS (Plummer et al. 2009). As such the inference framework is widely accessible, e.g., to 

73 ecologists, and not just to statisticians trained in the development of MCMC algorithms. 
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74 2 Data Augmentation 



75 Data augmentation provides a general approach for analyzing models in which the multinomial 

76 sample size, N, is an unknown quantity (Royle et al. 2007a). When N is unknown, the dimension 

77 of the parameter space is itself a variable and this requires that specialized MCMC algorithms be 

78 used (e.g., reversible jump MCMC) to analyze such models (e.g., Durban and Elston 2005; Schofield 

79 and Barker 2008). Data augmentation effectively reparameterizes models in which N is an unknown 
so parameter into a model in which N is removed as an explicit parameter, by marginalizing over a 
si Bin(M,ip) prior distribution for TV, where M >> N is fixed and ip is the unknown parameter to 

82 estimate. The resulting model for the augmented data can be expressed as a zero-inflated binomial 

83 mixture model and standard Gibbs sampling methods can be applied to the analysis of the model 

84 (Royle and Dorazio 2012). In addition, the model reformulated by data augmentation can be 

85 expressed directly in the BUGS language and therefore implemented in popular software packages 
se such as WinBUGS, OpenBUGS or JAGS. Data augmentation is especially useful for analysis of 
87 models with individual effects, either in the form of covariates or individual level random effects 
ss because the model preserves an individual encounter history formulation of the model that is 

89 conditional on the individual effects. 

90 To introduce data augmentation in the relevant context, suppose that a population of size N 

91 is sampled producing encounter data y, t on n < N individuals. Without loss of generality we 

92 suppose here a simple observation model in which j/j ~ Bmom(K , p) for each individual in the 

93 population where K is the number of samples of the population made under the assumption of 

94 closure to movement and population dynamics. This model is usually referred to as "model Mo" in 

95 the capture-recapture literature. The basic inference problem is to estimate N which is unknown. 

96 Data augmentation is motivated by a specific prior choice for N. For the case of a single 

97 population, we assume that N ~ Binom(M, ip) for some fixed M and ip ~ Unif(0, 1) which implies 

98 that the marginal prior of N is Unif(0, M). In this sense, data augmentation provides a reasonable 

99 specification of a non-informative prior for N. As a conceptual matter, this prior specification 

100 implies a pseudo-population of size M, and a segregation of members of that population into two 

101 sub-sets: (1) "real" individuals, which occur with probability ip, and (2) pseudo-individuals which 

102 are not members of the population of size N, and occur with probability 1 — ip. This form of 

103 the model that arises under data augmentation is convenient for implementation of models with 

104 individual-level effects because we can parameterize the model in terms of a collection of binary 

105 latent variables, say Zi, which are Bernoulli trials with parameter ip such that if Z{ = 1, then the 

106 observations j/j are generated according to the binomial observation model and if z% = then the 

107 observations are fixed zeros with probability 1. As such, under data augmentation, the model is 

108 a zero-inflated form of the known- iV model. In the parameterization of the model based on data 

109 augmentation, the parameter ip takes the place of N although N can easily be estimated directly 
no (see Royle, Dorazio and Link 2007a) or obtained as a prediction made from the MLEs under the 
m model for the augmented model. As a technical matter, the formulation of data augmentation 

112 for the class of models considered here and by Royle et al. (2007a) is consistent with "parameter 

113 expansion" of Liu and Wu (1999) in the sense that, in addition to data augmentation (Tanner and 

114 Wong 1987) an additional parameter is introduced to accommodate the augmented data (Royle 
us and Dorazio 2012). 

116 What we do in the remainder of this paper is develop extensions of data augmentation for models 

117 that apply to a population stratified in some fashion (e.g., spatially or temporally). For the case 
us of S strata having population sizes N = (A"i, N2, ■ ■ ■ , N$), the goal is to develop models describing 



3 



119 variation in the N s variables and provide an analysis framework for these models based on data 

120 augmentation. We emphasize that the N s are latent variables in the context of capture-recapture 

121 models. 

122 3 Multinomial Abundance Models for Capture- Recapture 

123 We suppose these S populations are sampled using some capture-recapture method producing 

124 sample sizes n s and encounter history yj for individual i = 1, 2, . . . , X^s=i n s- As the specific variants 

125 of capture-recapture observation models are not the main focus of this paper, we address only 

126 the most basic model in which encounter histories are reduced to the scalar encounter frequency, 

127 yi ~ Binom(X, p) where necessary. Although we note that treatment of, for example, sampling 

128 occasion effects in capture probabilities can easily be accommodated. Let gi be a covariate (integer- 

129 valued, 1,...,S) indicating the population membership of individual i. This covariate is observed 

130 for the sample of captured individuals but not for individuals that are not captured. 

131 A key idea that we develop shortly is that specific priors for gi are consistent with certain 

132 reasonable priors for the population size variables N s . Then, the data from all populations can 

133 essentially be pooled, and analyzed as data from a single population with the appropriate model 

134 on <7j. The partially observed population membership variable g^ is a categorical type of individual 

135 covariate (Huggins 1989; Alho 1990; Royle 2009). In the analysis of this model using data augmen- 

136 tation that we develop subsequently, the assumption of a model for the collection of abundance 

137 variables N s implies a specific model for the individual covariate gi. That is, data augmentation for 

138 stratified populations is equivalent to an "individual covariate" model with a specific distribution 

139 for the individual covariate. Shortly we will consider two models for gi derived from Poisson and 

140 negative binomial models for N s . 

mi To illustrate this data structure, we suppose that a population comprised of 4 sub-populations 

142 is sampled K = 5 times. Then a plausible data set has the following structure: 

us individual (i) : 1 2 3 4 5 6 7 8 9 10 

144 frequency (y) :1131122411 

145 group (g) :1112333344 

we This data set indicates three individuals were captured in subpopulation 1 (captured 1, 1, and 3 

147 times), a single individual was captured in population 2, four individuals were captured in popula- 

148 tion 3, and two individuals were captured in subpopulation 4. 

149 We suppose some distribution for the subpopulation size parameters, 

N s ~f{N-\ s ) 

150 for s = 1,2, ...,S. We consider specific forms of f(N) below. To model variation in N s as a 

151 function of some covariate, x(s), thought to affect variation in the population sizes, we consider 

152 models of the form: 

log(A s ) = /3o + /3i*z(s). 

153 The general strategy we adopt is to formulate the joint prior distribution for the N s parameters by 

154 conditioning on the total, say Nt = J2 S N s which is the super-population of all individuals alive in 

155 the S populations. We consider multinomial prior distributions for the subpopulation sizes: 

N\Nt ~ Multinom(7r|iV T ) (1) 

156 with specific forms of the cell probabilities 7v s dictated by the choice of f(N). This multinomial 

157 model forms the basis of our data augmentation scheme for multiple populations. 
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158 3.1 Poisson case 

159 We begin with the Poisson case because this model is the model commonly used in ecology for 
wo modeling count data, and so it is natural as a model for the unknown population size parameters. 

161 We assume 

N s ~ Poisson(A s ) (2) 

162 with 

log(A s ) = A, + Pix{s) (3) 

163 where x(s) is some measured attribute for population s. Under this Poisson model, by conditioning 

164 on the total population size over all S populations, the N s variables have a multinomial distribution: 

N = (iVi, . . . ,N S )\{N T = ^N s } ~ Multinom(7r|iV T ). (4) 

s 

165 with multinomial probabilities n s = X s / J2 S ^s- 

166 To devise a data augmentation scheme for this model of population size, we embed the multi- 

167 nomial for {N s } into a multinomial of the same dimension but with larger, fixed sample size. 

168 Specifically, we introduce a latent super-population variable G s which we assume has the desired 

169 Poisson distribution but with scaled mean: G s ~ Poisson(AA s ) where A » 1 where A exists (can 
no be chosen) to ensure that G s is arbitrarily larger than N s . Conditional on the total super-population 

171 size M = J2 S Gs, then G has a multinomial distribution: 

G\M ~ Multinom(M; tt) (5) 

172 where tt s = X s / J2 S ^ which are the same probabilities as for the target multinomial for N. This 

173 multinomial model for the super-population sizes G s is equivalent to the following: 

Qi ~ Categorical(7r) 

174 for <7j; i = 1, 2, . . . , M. Given G or, equivalently, gi, we specify a model for {N s } that differentiates 

175 between "real" and "pseudo-" individuals by a Bernoulli sampling model: 

N s ~ Binom(G s , i/j) 

176 where ip ~ Unif(0, 1). Bernoulli sampling preserves the marginal Poisson assumption (Takemura 

177 1999). That is, N s is Poisson, unconditional on G s and, also, conditional on N T = J2 s ^s, N has 
ns a multinomial with probabilities tt and index Nt- Note also that ~ Binom(M, <j>) which is 

179 consistent with data augmentation applied to total population size Nt- This binomial sampling 

180 model can be represented, equivalently, by the set of Bernoulli variables: 

Zi ~ Bern(^) 

181 for i = 1,2, . . . ,M. 

182 The multinomial construction makes it clear that ip is confounded with exp(/?o). By constructing 

183 the model conditional on the total, we lose information about the intercept (3o, but this is recovered 

184 in the data augmentation parameter ip. One of these parameters has to be fixed. We can set /3o = 

185 or else we can fix ip. The constraint can be specified by noting that, under the binomial data 
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186 augmentation model E[Nt] = ipM and, under the Poisson model, E[Nt] = ^ s exp(/3o + (3ix(s)) 

187 and so we can set 

i> = ^2exp(P + Pix{s)). 

s 

188 The equivalence of ijj and /3q can be thought of in terms of pooling data from the different sub- 

189 populations. In a model with no covariates, we could pool all of the data and estimate a single 
wo parameter if) or /3q but not both. In this sense, pooling data from multiple spatial samples is 

191 justifiable (in terms of sufficiency arguments) under a Poisson assumption on local abundance 

192 (which was noted by Royle 2004b; Royle and Dorazio 2008, sec. 5.5.1). 

193 3.2 Implementation 

194 By introducing the latent G s structure, and the Bernoulli sampling of N s , the model is equivalently 

195 represented by the latent variable pair (gi, Zi) where g% is categorical with prior probabilities tt s and 

196 Zi ~ Bern(ip). In particular, the multinomial assumption for the latent variables G s is formulated 

197 in terms of "group membership" for each individual in the super-population of size M according 

198 to: 

gi ~ Categorical (ir) 

199 with 7r = (7ri, . . . ,tts) and ir s = X s /(Yls A s ). Note that aggregating these M categorical variables 

200 yields a set of multinomial variables consistent with Eq. [ij That is, define G\ = J2iLiH9i = l)i 

201 G*2 = ^2f = rl(9i = 2), etc., where IQ is the indicator function. The binomial sampling from the 

202 super-population, Nt ~ Binom(M, ip) can be described at the level of the individual also, by 

203 introducing the binary variables z\ , . . . , zm such that 

Zi ~ Bern(^) 

204 where ip is constrained as noted in the previous section. We implement this individual-level formu- 

205 lation of the model in BUGS in Panel [TJ 

206 A second implementation of the model is suggested by working from Eq. Q - we can marginalize 

207 Nt over the prior Nt ~ Binom(M, </>) to see that the (S + 1) x 1 vector (N±, . . . , Ns, Ns+i) has, 

208 conditional on M, a multinomial distribution with cell probabilities 7r+ = 7r s ^ for s = 1,2, ... ,S 

209 and t^s + i = (1 — ip) f° r the last cell which corresponds to individuals of the super-population that 

210 are not members of any of the S populations that were subject to sampling. Thus, 

N|M ~ Multinom(7r + ). 

211 where the superscript + here indicates that it is a larger version of tv from [5] In this case, 

gi ~ Categorical(7r + ) for i = 1, . . . , M (6) 

212 3.3 Negative Binomial Model 

213 It is natural to consider abundance models that exhibit over-dispersion relative to the Poisson 

214 model. Here we generalize the model formulation of the previous section for that purposes. We 

215 introduce a collection of iid latent super-population size variables G s which have a negative binomial 

216 distribution and we apply the binomial sampling model of data augmentation to these super- 

217 population sizes. Then, conditioning on the total super-population size M = Y2 S C*s, this produces 
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218 the Dirichlet compound multinomial (DCM) distribution for the vector (Gi, . . . ,Gs) (Takemura 

219 1999). Equivalently, the DCM arises by placing a Dirichlet(o:) prior on the multinomial probability 

220 vector 7V. 

221 We take the approach of a Gamma-poisson mixture (following Takemura (1999)) in which G s 

222 has a Poisson distribution conditional on \r] s where rj s ~ Gamma(a, (3) and the G s are mutually 

223 independent. Marginalizing r) s from the conditional Poisson model produces the negative binomial 

224 having pmf 

T(a + G s ) (\pf° 
{Us> T(a)G s \ (l + \(3)Gs+a 

225 so the mean A is preserved as a parameter in this formulation: the expected value of G s is E[G S ] = 

226 a/3 A. Therefore we require a constraint among the parameters a and f3. In our analysis below we 

227 set f3 = 1. In fact, the total M = ^ s G s is the sufficient statistic for (3 in the Gamma-Poisson 

228 mixture^] Therefore, the two free parameters to estimate are a and A. Preserving A in the model is 

229 convenient for modeling covariates - which we can do in the usual way: 

log(A s ) = /3o + Pix(s) 

230 for some covariate x(s). 

231 To apply data augmentation to this model, we relate the population size variables N s to G s by 

232 the binomial sampling model: 

N s ~ Binom(G s ,V0 

233 which allows us to implement the model using iid Bernoulli trials Zi ~ Bern(^) and an individual 

234 covariate g, t ~ Categorical(7r) as before. In this case, we construct the cell probabilities ir s according 

235 tO 

n s = ^ r- 

22s Vs^s 

236 where rj s is the gamma noise term, and A s represents the fixed effects. Implementation of this 

237 model in the various BUGS engines poses no special difficulty, see Panel [2] 

238 Marginally G s is negative binomial and under this Bernoulli sampling model the N s are also 

239 marginally independent negative binomial models (Takemura 1999, Appendix A) or, when condi- 

240 tioned on the total Nt = Y^s is Dirichlet compound multinomial (DCM). 



241 4 Capture-recapture with individual effects 

242 The individuablevel formulations of the model have utility in the analysis of general capture- 

243 recapture models because it allows for the development of individual-level models for the encounter 

244 probability independent of the models describing variation in N among populations. Models that 

245 affect abundance determine the form of the cell probabilities 7r whereas models of encounter prob- 

246 ability are parameterized directly at the level of individual encounters. For example, if m are the 

247 individual encounter frequencies, then the most basic model is: 

Hi ~ B'mom(K;pzi) 

248 so that if Zi = then y = with probability 1. In general we can express the model so that p varies 

249 by individual, sample occasion, stratum or according to some specific covariate. Therefore, a more 

An ecological example of a DCM application is Link and Sauer (1997) who use the DCM for modeling nuisance 
variation in the BBS. In that case, r] s is the "observer effect". 
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general formulation conditions on the stratum membership variable g: yi y k\di ~ Binom(l£';pifc(<7i);Z}) 
with additional individual- level model structure imposed on Pi t k(di)- The simplicity of the obser- 
vation model is retained regardless of the complexity of models for p, and this yields considerable 
flexibility in model development and, importantly, separates the development of such models (for 
encounter probability) from the development of models describing variation in N among subpopu- 
lations. 

We provide an implementation of the capture- recapture model for stratified populations in Panel 
[TJ This is for the Poisson model in which 

log(A s ) = /3o + /3ix(s). 

The full specification of the model shown in Panel [T] for the Poisson model. The BUGS model 
specification for the model in which N s have the joint Dirichlet compound multinomial distribution 
is shown in Panel [2] Note in the 2nd case, we demonstrate the alternative formulation in which ijj 
is a free parameter with /3q constrained to 0. 

The Poisson model shown in Panel [T] is summarized by the following elements: 

Observation model: 

Ui\zi ~ Bmom(zip, K) 
Process model: 

Zi ~ Bern(^) 

gi ~ Categorical(7r) 



263 where ir s = A S /^A S . The prior distributions are: 

p ~ Unif(0,l) 
/3 ,Pi ~ Normal(0,.l) 

s 

264 Note the BUGS language uses a parameterization of the normal distribution in terms of the t = 

265 1/(T 2 . 



266 5 Application: Effect of Forest Management on Dear Mouse Pop- 

267 ulations 

268 Here we consider a typical problem which motivates the need for hierarchical models of the type con- 

269 sidered in this paper. The data come from Converse et al. (2006), and are based on a mark-recapture 

270 study of small mammals examining the ecological impacts of forest thinning and prescribed burn- 

271 ing treatments on 13 study areas across the US (P. Weatherspoon and J. Mclver, United States 

272 Department of Agriculture Forest Service, unpublished report). Here we focus on data collected at 

273 the Southwest Plateau Study Area, near Flagstaff, Arizona. The Southwest Plateau Study Area 

274 was composed of 3 study sites (blocks), each of which was in turn composed of 4 experimental 

275 units. Live-trapping grids were used to catch small mammals in each of these 12 experimental 

276 units in each year from 2000 to 2003. Twelve experimental units by 4 years produced 48 groups. 
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model { 

# This version constrains psi with 

# the intercept parameter 
p~ dunif(0,l) 
bO~dnorm(0, . 1) 
bl~dnorm(0, . 1) 

psi<- sum(lam[])/M 

for(s in 1:S){ 

log(lam[s]) <- bO + bl*x[s] 
gprobs[s]<- lam[s] /sum(lam[l : S] ) 

} 

for(i in 1:M){ 

g[i] ~ dcat (gprobs [] ) 

z[i] ~ dbern(psi) 
y[i]~ dbin(mu[i] ,K) 
mu[i] <- z [i] *p 

} 

N <- sum(z[l:M]) 



Panel 1: BUGS model specification for a capture-recapture model with constant encounter proba- 
bility and Poisson subpopulation sizes, N s , with mean depending on a single covariate x[s] . 



model { 

p~ dunif(0,l) 
b0~ dnorm(0, . 1) 
bl~ dnorm(0, . 1) 
a~dunif (0,1000) 

b<-l # set to 1 — trades-off with "psi' 

psi" dunif(0,l) # instead, psi is estimated here 



for(s in 1:S){ 

eta[s] ~ dgamma(a,b) 

lam[s]<- bO + bl*x[s] 

alpha [s]<- eta[s] *lam[s] 

gprobs [s] <- alpha [s] / sum (alpha [] ) 

} 

ford in 1:M){ 

g[i] ~ dcat (gprobs [] ) 
z[i] ~ dbern(psi) 
mu[i] <- z [i] *p 
y[i] ~ dbin(mu[i] ,K) 

} 

N <- sum(z[l:M]) 



Panel 2: BUGS model specification for a capture-recapture model with constant encounter proba- 
bility and Dirichlet compound-multinomial population sizes. 
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277 Thus, the sub-populations (strata) in our analysis are indexed by an aggregate "space x time" 

278 index k = 1,2, ... ,48. Before data were collected in 2003, thinning treatments were applied to 2 

279 experimental units at each study site. The deer mouse (Peromyscus maniculatus) combined with 

280 a small number of individuals of the closely related brush mouse (P boylii) create the dataset we 

281 analyze here (489 individuals; range = 0-45 per experimental unit per year). Live-trapping occurred 

282 in the late summer on a grid with 25-m spacing (50-m spacing in the first year). Traps were baited 

283 with grain and checked mornings and afternoons for 5 days to yield 10 trapping occasions (in 2000, 

284 6 of the experimental units were trapped for only 9 occasions). Captured individuals received 2 

285 uniquely- numbered ear tags. 

286 The main objective is to evaluate the effect of treatment (forest thinning) on population size. 

287 Importantly, the response variable N s is a latent variable and cannot be measured directly, and 

288 thus hierarchical models are necessary to integrate data from the 48 replicates and to explicitly 

289 incorporate the biological hypothesis (effect of thinning) into the model for abundance. 

290 In our analysis, following Converse et al. (2006), we fitted a model that contains the treatment 

291 effect (1 parameter), and fixed block (2 parameters), and year (3 parameters) effects. We param- 

292 eterize the two block effects and year effects using dummy variables Bl and B2 and YR1, YR2 and 

293 YR3. The population size model is specified by: 

log(A s ) = fa + fcfrt a + /3 2 B1 S + /3 3 B2, + /3 4 YR1 S + /3 5 YR2 S + /3 6 YR3 S 

294 and, for the group membership variables gi, 

g. L ~ Categorical(-7r) 

295 where 

296 For the data augmentation variables we have 

Zi ~ Bern(^) 

297 and the observation model is 

y ik ~ Bem(p ik z ik ). 

298 We consider a detection probability model that contains a behavioral response according to: 

logit(pa.) = a + aiy iifc _i 

299 Here, ot\ affects the increase or decrease in encounter probability for previously captured individuals. 

300 This model was fitted in WinBUGS using a modification of that shown in Panel [TJ 



5.1 Goodness-of-Fit 

We evaluated the goodness-of-fit of the model described previously using a Bayesian p-value (Gel- 
man et al. 1996). The statistic was based on the adequacy of the model at explaining the variation 
in population-specific sample sizes of observed individuals, n s . For each posterior sample, m, we 
computed a fit statistic based on Pearson residuals according to 
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where n s is the number of individuals encountered in population s, tit = ^2 s n s , and tt s is the 
probability that an individual in the super-population of size Nt belongs to population s. Note 
that 7r™ depends on model parameters and therefore for each iteration of the MCMC algorithm, 
its value changes. The fit statistic computed from the observed data is compared to that obtained 
by samples from the posterior predictive distribution of the data. That is, for each iteration m of 
the MCMC algorithm, we obtain a draw of n™, a prediction of the number of individuals captured 
for population s and n™ is the total sample size of the simulated populations. The fit statistic is 
then computed using the simulated data: 
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In™ - n^ir™} 2 



We believe that this fit statistic should emphasize the fit (or lack therefore) of the spatial 
component of the model which is our main interest here. The scatter of X & s vs X s i m for the Poisson 
model fitted to the Peromyscus data is shown in the left panel of Fig. [TJ The p-value is 0.000 for 
that case indicating a clear lack of fit. Conversely, the fit under the Dirichlet compound- multinomial 
base model (with treatment, block and year effects) (right panel) yields a p-value of 0.558 indicating 
that this model provides an adequate fit to the data. We assessed convergence by looking at the 
Brooks-Gelman-Rubin "R-hat" statistics (Gelman and Rubin 1992; Brooks and Gelman 1997) for 
the structural parameters of the model. For both Poisson and DCM models, those were all less 
than 1 . 1 (see Table [T]) and so we think the MCMC output is adequate for characterizing features 
of the posterior distribution. Other posterior summaries are provided in Table [T] To summarize 
the salient results, the treatment effect is somewhat strong under both models, and comparable 
in magnitude, but less precise under the DCM model as we might expect, because under this 
model we expect more background variation in population sizes N s . How different is the estimated 
distribution of the population among the 48 groups? We plot the estimated population membership 
probabilities, tt s , under the two models in Fig. [2j In general, there is less shrinkage to the mean 
under the DCM model - we see one site where the Poisson model over-predicts substantially and 
2 moderate under-predictions but otherwise most of the population proportions are similar. Both 
models indicate a very strong positive behavioral response, indicating strong trap-happiness. Total 
population size is slightly higher under the DCM model (posterior mean: roughly 710 vs. 695) and 
the posterior is only slightly more diffuse under the DCM model. 



334 6 Discussion 

335 Understanding variation in population size, N, in structured populations is a fundamental inter- 

336 est in animal ecology. A common situation is that in which populations are stratified (spatially, 

337 temporally, etc.). Questions related to how N responds to treatments or landscape variation are 

338 routine in ecology and natural resource studies. Unfortunately, N is not observable in practical field 

339 situations. As a result, many methods have been devised for estimating N from individual level 

340 encounter history data (e.g., Williams et al. 2002). A common approach to studying variation in 

341 the size, N, among populations is to obtain estimates of N using such capture-recapture methods, 

342 and then regard such estimates as "data" in a second-stage procedure (Converse and Royle 2012). 

343 A more contemporary approach to modeling variation in N is to use hierarchical models (Royle and 

344 Dorazio 2006, 2008; Link and Barker 2009; Kery and Schaub 2011) in which the observation model 

345 is formulated conditional on N, and the model is extended to include an additional component 

346 describing variation in N. 
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Figure 1: Scatter of fit statistic for observed data vs. posterior simulated data sets under Poisson 
(left panel) and Dirichlet compound multinomial models (right panel). 
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Table 1: Posterior summaries of multinomial and Dirichlet compound multinomial hierarchical 
capture-recapture models fitted to 48 populations of Microtus. The two models correspond to 
the multinomial abundance model which arises under a Poisson assumption, and the Dirichlet 
compound multinomial model which arises under a negative binomial assumption for abundance. 

Multinomial Model 



parameter 


mean 


sd 


2.5% 


50% 


97.5% 


Rhat 


I U 


1.700 


0.148 


1.340 


1.704 


1.970 


1 


.011 


ri. 


0.835 


0.161 


0.522 


0.834 


1.166 


1 


.001 


block[2] 


0.872 


0.132 


0.632 


0.865 


1.147 


1 


.001 


block 3 

L J 


1.080 


0.130 


0.844 


1.077 


1.342 


1 


.001 


vear[2l 


-0.327 


0.150 


-0.626 


-0.326 


-0.026 


1 


.007 


VP9T" r^l 


0.324 


0.132 


0.070 


0.326 


58fi 


1 


.014 


year [4] 


0.118 


0.166 


-0.222 


0.122 


0.427 


1 


.006 


«o (intercept) 


-2.038 


0.133 


-2.264 


-2.044 


-1.783 


1 


.019 


a.\ (trap happy) 


0.495 


0.142 


0.214 


0.500 


0.755 


1 


.029 




0.637 


0.045 


0.558 


0.636 


0.728 


1 


.014 


N 


695.064 


47.019 


616.000 


693.000 


788.000 


1 


.016 




Dirichlet compound 


. multinomial model 








parameter 


mean 


sd 


2.5% 


50% 


97.5% 


Rhat 


A) 


0.439 


0.433 


0.151 


0.379 


1.385 


1 


.020 


/3i 


0.781 


0.338 


0.559 


0.790 


1.432 


1 


.010 


block[2] 


0.832 


0.226 


0.680 


0.835 


1.273 


1 


.008 


block[3] 


1.012 


0.225 


0.869 


1.018 


1.446 


1 


.012 


year [2] 


-0.340 


0.256 


-0.514 


-0.340 


0.161 


1 


.006 


year [3] 


0.225 


0.249 


0.067 


0.224 


0.710 


1 


.006 


year [4] 


0.114 


0.296 


-0.087 


0.115 


0.694 


1 


.009 


«o (intercept) 


-2.078 


0.146 


-2.175 


-2.070 


-1.798 


1 


.007 


cci (trap happy) 


0.540 


0.156 


0.431 


0.532 


0.854 


1 


.007 




0.652 


0.054 


0.615 


0.645 


0.771 


1 


.004 


N 


710.786 


56.170 


671.000 


703.000 


839.000 


1 


.005 
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Figure 2: Posterior means of subpopulation probabilities, n s , under Multinomial (Poisson) and 
Dirichlet compound multinomial (negative binomial) models. 
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347 In principle, Bayesian analysis of stratified abundance models is straightforward without data 

348 augmentation, for certain classes of models in which parameters do not vary at the level of indi- 

349 vidual. In such cases, encounter histories can be aggregated into frequencies for which the joint 

350 distribution is a simple multinomial with index N s . Various prior distributions can be specified for 

351 N s , and the resulting model analyzed directly using standard Bayesian or frequentist methods. One 

352 example is given in Royle et al. (2007b) and similar models can be found in, for examples, Royle et 

353 al. (2004a) using a distance-sampling model, and Dorazio et al. (2005) for a removal type of sam- 

354 pling protocol. Certain classes of multinomial-mixture models can be fitted by likelihood methods 

355 in the software package unmarked (Fiske and Chandler 2012) using the functions multinomPois 

356 and gmultmix (see Chandler et al. 2011). However, direct analysis of such models by MCMC is 

357 more difficult when they contain individual-level effects, and that motivates the need for methods 

358 based on data augmentation as developed in our paper. Another approach suggested for analyzing 

359 such models is reversible jump MCMC (RJMCMC; e.g., Schofield and Barker 2008, 2011). Unlike 

360 the data augmentation approach, RJMCMC retains the collection of population size parameters 

361 N s in the model and relies on a specialized MCMC algorithm to move about the state-space of N s 

362 while reconciling the dimensionality of the remaining parameters. Conversely, in analyzing these 

363 models by data augmentation, as we have done in this paper, the N s parameters are removed from 

364 the model by marginalizing over the next level in the hierarchical model. In the present case, the 

365 joint prior distribution of N|iV~T is marginalized over the prior for Nt- The advantage of removing 

366 the N s parameters from the model is that the dimension of the parameter space is fixed and thus 

367 standard Gibbs Sampling algorithms can be used to update model parameters. It has been sug- 

368 gested that a deficiency with the use of data augmentation to analyze capture-recapture models 

369 is that because N is not retained as an explicit parameter, alternative priors or models could not 

370 be imposed on population size. Our work here resolves that problem in some generality. In par- 

371 ticular, considering multinomial or Dirichlet compound-multinomial models yields great flexibility 

372 in modeling variation in N s among subpopulations using capture-recapture observations models of 

373 arbitrary complexity at the individual level. 

374 A key idea of our models is conditioning on the total population size among the distinct pop- 

375 ulations - the super-population size of individuals alive in all of the populations combined. This 

376 is a similar concept to that employed in the Crosbie-Manly-Schwarz-Arnason (CMSA) formulation 

377 (Schwarz and Arnason 1996) of the Jolly-Seber model which involves a temporal sequence of pop- 

378 ulations N t ;t = 1,2, ...,T. However, in that case, parameterization of the population sizes N t 

379 is based on a Markovian survival/recruitment model as the sub-populations may share the same 

380 individuals, but with a state-variable (alive, or not) which evolves over time due to recruitment 

381 and mortality. A formulation of the CMSA model using data augmentation was given in Royle and 

382 Dorazio (2008, ch. 10), and Kery and Schaub (2011, ch. 10). In the context of the CMSA model, 

383 the gi variable is "period of entry" into the population. Our model seems relevant to formulating 

384 open population models of the CMSA variety. In particular, with gi being the period of entry into 

385 the population, we could suppose that 

gi ~ Categorical(7r) 

386 where nt = At/(^ t At), and therefore we can model factors that affect recruitment directly on 

387 At, Our analysis therefore suggests that the CMSA formulation of the model implies a Poisson 

388 recruitment model, where the total number of recruits in year t, R t , is Poisson with mean X t . 

389 Alternatively, the DCM model could be used as a model for recruitment to accommodate variability 

390 across years. 
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391 The other key idea of our model development here is that we embed this "super-population" 

392 in the CMSA sense into yet a larger population - a super-super-population - of size M, which 

393 facilitates the use of data augmentation (Royle et al. 2007b). Data augmentation produces an 

394 individual-level formulation of models for stratified populations in terms of the group membership 

395 variable gi, which is a categorical type of individual covariate as in Huggins (1989), Alho (1990), and 

396 Royle (2009) - gi is observed for each individual in the sample but not for unobserved individuals. 

397 In particular, the assumption of a model for N s implies a specific model for the individual covariate 

398 gi. Therefore, data augmentation for spatially stratified populations is equivalent to an "individual 

399 covariate" model with a specific distribution for the individual covariate. 

400 We provided an illustration of our model formulation to a small-mammal trapping study wherein 

401 the main objective was to evaluate a treatment effect on population size. Historically such analyses 

402 have been carried out by using closed population models to obtain estimates of N for each grid and 

403 then treat the estimates as data in a second-stage regression procedure (e.g., Converse et al. 2006). 

404 Our analysis of these data involved a behavioral response model, which is almost always essential in 

405 small-mammal trapping studies. Thus our model was a multi-population version of this important 

406 type of closed population model. Although we considered a single specific encounter probability 

407 model ("model Mt>") we could as well consider any other model containing individual- level effects, 

408 including alternative behavioral response formulations (Yang and Chao 2005), individual hetero- 

409 geneity (Pledger 2000; Dorazio and Royle 2003) or spatially explicit capture-recapture models 

410 (Borchers and Efford 2008; Royle and Young 2008). 

411 The method is also relevant to other problems in which abundance is naturally stratified. A 

412 common situation in wildlife surveys involves sampling groups of individuals (Royle 2008; Royle 

413 2009) , such as flocks of birds, herds of ungulates, or family groups of marine mammals. In such cases, 
4u encounters of individuals are not independent of one another and it is important to accommodate 

415 that in models of the encounter process. Let Xi be the size of observation i and suppose Xi ~ 

416 Poisson(A). Then, the number of groups of size s is Yli^{ x i = s} = N s and, conditional on the 

417 total Nt = Yls-^s, the vector Ni,...,Ns has a multinomial distribution with sample size Nt 

418 and probabilities ir s = A S /^A S as before. Thus, we can apply data augmentation to this case 

419 directly. Naturally, the encounter rate of groups should depend on the size of the group which can 

420 be modeled in a number of ways, e.g., we can define 

logit(p s ) = a + ai(s - 1) 

421 or similar. As another example, consider the case of a single sex-stratified capture-recapture model. 

422 In this case, S = 2, and the partially latent variable g equates to sex, and has two possible values 

423 (male/female). Such models have been fitted previously by data augmentation (Gardner et al. 

424 2010; Mollet et al. 2012). 
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